Adapted from the course material provided by the Institute for Systems Biology, Seattle, Washington, USA.
Load required libraries.
# 0.1. Load required packages for today
library(RColorBrewer) # 1. library to access easily multiple colors for plotting
library(Rtsne) # 2. R implementation of the t-SNE algorithm
library(tictoc) # 3. library to profile execution time
library(Seurat) # 4. library to single-cell analysis
## Registered S3 method overwritten by 'R.oo':
## method from
## throw.default R.methodsS3
library(magrittr) # 5. library for introducing pipe syntax: https://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html
library(dplyr) # 6. useful to manipulate data frames
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Working on the guided clustering tutorial from Seurat.
We are analyzing a very commonly used dataset of peripheral blood mononuclear cells (PBMC) freely available from 10X Genomics. The experiment yield 2,700 single cell transcriptomes, sequenced on the Illumina NextSeq 500. Data originally retrieved from here. Originally retrieved from: https://s3-us-west-2.amazonaws.com/10x.files/samples/cell/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz
Once the tar file is downloaded, you can use the following command to load the data into R:
pbmc.data=Read10X(data.dir="~/scratch/filtered_gene_bc_matrices/hg19/")
We then saved the data into and R object for convenient loading in this course using the command:
save(pbmc.data, file = "~/scratch/pbmc.RData")
So now, let’s load the data and start the exploration!
load("data/pbmc.RData")
Initialize the Seurat object with the raw (non-normalized data). Keep all genes expressed in >= 3 cells (~0.1% of the data). Keep all cells with at least 200 detected genes.
pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", min.cells = 3, min.features = 200)
## Warning: Feature names cannot have underscores ('_'), replacing with dashes
## ('-')
pbmc
## An object of class Seurat
## 13714 features across 2700 samples within 1 assay
## Active assay: RNA (13714 features)
In addition to the number of genes detected, the community typically filters out cells with high percentage of mapping reads to the mitochondrial genome. This is due to low-quality / dying cells often exhibit extensive mitochondrial contamination. Mitochondrial QC metrics are computed with the PercentageFeatureSet function, which calculates the percentage of counts originating from a set of features (mitochondrial genes). All genes starting with MT- as a set of mitochondrial genes
Info: The [[ operator can add columns to object metadata. This is a great place to stash QC stats.
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
Visualize QC metrics as a violin plot.
VlnPlot(pbmc, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
FeatureScatter is typically used to visualize feature-feature relationships, but can be used for anything calculated by the object, i.e. columns in object metadata, PC scores etc.
plot1 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "percent.mt")
plot2 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
CombinePlots(plots = list(plot1, plot2))
Let’s get the high quality transcriptomes by filtering out cells that have unique gene counts over 2,500 or less than 200 genes.
pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)
We can normalize expression data by a global-scaling normalization method LogNormalize that normalizes the feature expression measurements for each cell by the total expression, multiplies this by a scale factor (10,000 by default), and log-transforms the result.
pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize", scale.factor = 10000)
Next, we will retrieve genes with high cell-to-cell variation across the data set. In this case we will consider both. Focusing on these genes in downstream analysis helps to highlight biological signal in single-cell datasets.
Let’s use the FindVariableFeatures function that directly models the mean-variance relationship inherent in single-cell data.
pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000)
# Identify the 10 most highly variable genes
top10 <- head(VariableFeatures(pbmc), 10)
# plot variable features with and without labels
plot1=VariableFeaturePlot(pbmc)
LabelPoints(plot = plot1, points = top10, repel = TRUE)
## Warning: Using `as.character()` on a quosure is deprecated as of rlang 0.3.0.
## Please use `as_label()` or `as_name()` instead.
## This warning is displayed once per session.
## When using repel, set xnudge and ynudge to 0 for optimal results
## Warning: Transformation introduced infinite values in continuous x-axis
all.genes <- rownames(pbmc)
pbmc <- ScaleData(pbmc, features = all.genes)
## Centering and scaling data matrix
pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc))
## PC_ 1
## Positive: CST3, TYROBP, LST1, AIF1, FTL, FTH1, LYZ, FCN1, S100A9, TYMP
## FCER1G, CFD, LGALS1, S100A8, CTSS, LGALS2, SERPINA1, IFITM3, SPI1, CFP
## PSAP, IFI30, SAT1, COTL1, S100A11, NPC2, GRN, LGALS3, GSTP1, PYCARD
## Negative: MALAT1, LTB, IL32, IL7R, CD2, B2M, ACAP1, CD27, STK17A, CTSW
## CD247, GIMAP5, AQP3, CCL5, SELL, TRAF3IP3, GZMA, MAL, CST7, ITM2A
## MYC, GIMAP7, HOPX, BEX2, LDLRAP1, GZMK, ETS1, ZAP70, TNFAIP8, RIC3
## PC_ 2
## Positive: CD79A, MS4A1, TCL1A, HLA-DQA1, HLA-DQB1, HLA-DRA, LINC00926, CD79B, HLA-DRB1, CD74
## HLA-DMA, HLA-DPB1, HLA-DQA2, CD37, HLA-DRB5, HLA-DMB, HLA-DPA1, FCRLA, HVCN1, LTB
## BLNK, P2RX5, IGLL5, IRF8, SWAP70, ARHGAP24, FCGR2B, SMIM14, PPP1R14A, C16orf74
## Negative: NKG7, PRF1, CST7, GZMB, GZMA, FGFBP2, CTSW, GNLY, B2M, SPON2
## CCL4, GZMH, FCGR3A, CCL5, CD247, XCL2, CLIC3, AKR1C3, SRGN, HOPX
## TTC38, APMAP, CTSC, S100A4, IGFBP7, ANXA1, ID2, IL32, XCL1, RHOC
## PC_ 3
## Positive: HLA-DQA1, CD79A, CD79B, HLA-DQB1, HLA-DPB1, HLA-DPA1, CD74, MS4A1, HLA-DRB1, HLA-DRA
## HLA-DRB5, HLA-DQA2, TCL1A, LINC00926, HLA-DMB, HLA-DMA, CD37, HVCN1, FCRLA, IRF8
## PLAC8, BLNK, MALAT1, SMIM14, PLD4, LAT2, IGLL5, P2RX5, SWAP70, FCGR2B
## Negative: PPBP, PF4, SDPR, SPARC, GNG11, NRGN, GP9, RGS18, TUBB1, CLU
## HIST1H2AC, AP001189.4, ITGA2B, CD9, TMEM40, PTCRA, CA2, ACRBP, MMD, TREML1
## NGFRAP1, F13A1, SEPT5, RUFY1, TSC22D1, MPP1, CMTM5, RP11-367G6.3, MYL9, GP1BA
## PC_ 4
## Positive: HLA-DQA1, CD79B, CD79A, MS4A1, HLA-DQB1, CD74, HLA-DPB1, HIST1H2AC, PF4, TCL1A
## SDPR, HLA-DPA1, HLA-DRB1, HLA-DQA2, HLA-DRA, PPBP, LINC00926, GNG11, HLA-DRB5, SPARC
## GP9, AP001189.4, CA2, PTCRA, CD9, NRGN, RGS18, GZMB, CLU, TUBB1
## Negative: VIM, IL7R, S100A6, IL32, S100A8, S100A4, GIMAP7, S100A10, S100A9, MAL
## AQP3, CD2, CD14, FYB, LGALS2, GIMAP4, ANXA1, CD27, FCN1, RBP7
## LYZ, S100A11, GIMAP5, MS4A6A, S100A12, FOLR3, TRABD2A, AIF1, IL8, IFI6
## PC_ 5
## Positive: GZMB, NKG7, S100A8, FGFBP2, GNLY, CCL4, CST7, PRF1, GZMA, SPON2
## GZMH, S100A9, LGALS2, CCL3, CTSW, XCL2, CD14, CLIC3, S100A12, CCL5
## RBP7, MS4A6A, GSTP1, FOLR3, IGFBP7, TYROBP, TTC38, AKR1C3, XCL1, HOPX
## Negative: LTB, IL7R, CKB, VIM, MS4A7, AQP3, CYTIP, RP11-290F20.3, SIGLEC10, HMOX1
## PTGES3, LILRB2, MAL, CD27, HN1, CD2, GDI2, ANXA5, CORO1B, TUBA1B
## FAM110A, ATP1A1, TRADD, PPA1, CCDC109B, ABRACL, CTD-2006K23.1, WARS, VMO1, FYB
Visualize PC loadings.
VizDimLoadings(pbmc, dims = 1:2, reduction = "pca")
Visualize cells in the first two PCs.
DimPlot(pbmc, reduction = "pca")
Heatmap of expression for the 15 genes with stronger loadings.
DimHeatmap(pbmc, dims = 1, cells = 500, balanced = TRUE)
Iterate heatmap visualization across 10 components.
DimHeatmap(pbmc, dims = 1:9, cells = 500, balanced = TRUE)
ElbowPlot(pbmc)
pbmc <- FindNeighbors(pbmc, dims = 1:10)
## Computing nearest neighbor graph
## Computing SNN
pbmc <- FindClusters(pbmc, resolution = 0.5)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 2638
## Number of edges: 96033
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8720
## Number of communities: 9
## Elapsed time: 0 seconds
We can use UMAP to perform dimensionality reduction.
# if RunUMAP fails, most likely it is because UMAP is missing. Run the command line "pip install umap-learn" in a your termina and try again
pbmc <- RunUMAP(pbmc, dims = 1:10)
DimPlot(pbmc, reduction = "umap")
# alternatively, we can use t-SNE
#pbmc=RunTSNE(pbmc,dims=1:25)
#TSNEPlot(pbmc) # add file='figure.pdf' if you have some Quartz related errors.
Find all markers of cluster 1.
cluster1.markers <- FindMarkers(pbmc, ident.1 = 1, min.pct = 0.25)
head(cluster1.markers, n = 5)
## p_val avg_logFC pct.1 pct.2 p_val_adj
## IL32 1.894810e-92 0.8373872 0.948 0.464 2.598542e-88
## LTB 7.953303e-89 0.8921170 0.981 0.642 1.090716e-84
## CD3D 1.655937e-70 0.6436286 0.919 0.431 2.270951e-66
## IL7R 3.688893e-68 0.8147082 0.747 0.325 5.058947e-64
## LDHB 2.292819e-67 0.6253110 0.950 0.613 3.144372e-63
Find all markers distinguishing cluster 5 from clusters 0 and 3
cluster5.markers <- FindMarkers(pbmc, ident.1 = 5, ident.2 = c(0, 3), min.pct = 0.25)
head(cluster5.markers, n = 5)
## p_val avg_logFC pct.1 pct.2 p_val_adj
## FCGR3A 7.583625e-209 2.963144 0.975 0.037 1.040018e-204
## IFITM3 2.500844e-199 2.698187 0.975 0.046 3.429657e-195
## CFD 1.763722e-195 2.362381 0.938 0.037 2.418768e-191
## CD68 4.612171e-192 2.087366 0.926 0.036 6.325132e-188
## RP11-290F20.3 1.846215e-188 1.886288 0.840 0.016 2.531900e-184
Find markers for every cluster compared to all remaining cells, report only the positive ones
pbmc.markers <- FindAllMarkers(pbmc, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
## Calculating cluster 0
## Calculating cluster 1
## Calculating cluster 2
## Calculating cluster 3
## Calculating cluster 4
## Calculating cluster 5
## Calculating cluster 6
## Calculating cluster 7
## Calculating cluster 8
pbmc.markers %>% group_by(cluster) %>% top_n(n = 2, wt = avg_logFC)
## # A tibble: 18 x 7
## # Groups: cluster [9]
## p_val avg_logFC pct.1 pct.2 p_val_adj cluster gene
## <dbl> <dbl> <dbl> <dbl> <dbl> <fct> <chr>
## 1 1.96e-107 0.730 0.901 0.594 2.69e-103 0 LDHB
## 2 1.61e- 82 0.922 0.436 0.11 2.20e- 78 0 CCR7
## 3 7.95e- 89 0.892 0.981 0.642 1.09e- 84 1 LTB
## 4 1.85e- 60 0.859 0.422 0.11 2.54e- 56 1 AQP3
## 5 0. 3.86 0.996 0.215 0. 2 S100A9
## 6 0. 3.80 0.975 0.121 0. 2 S100A8
## 7 0. 2.99 0.936 0.041 0. 3 CD79A
## 8 9.48e-271 2.49 0.622 0.022 1.30e-266 3 TCL1A
## 9 2.96e-189 2.12 0.985 0.24 4.06e-185 4 CCL5
## 10 2.57e-158 2.05 0.587 0.059 3.52e-154 4 GZMK
## 11 3.51e-184 2.30 0.975 0.134 4.82e-180 5 FCGR3A
## 12 2.03e-125 2.14 1 0.315 2.78e-121 5 LST1
## 13 7.95e-269 3.35 0.961 0.068 1.09e-264 6 GZMB
## 14 3.13e-191 3.69 0.961 0.131 4.30e-187 6 GNLY
## 15 1.48e-220 2.68 0.812 0.011 2.03e-216 7 FCER1A
## 16 1.67e- 21 1.99 1 0.513 2.28e- 17 7 HLA-DPB1
## 17 7.73e-200 5.02 1 0.01 1.06e-195 8 PF4
## 18 3.68e-110 5.94 1 0.024 5.05e-106 8 PPBP
The ROC test returns the ‘classification power’ for any individual marker (ranging from 0 - random, to 1 - perfect).
cluster1.markers <- FindMarkers(pbmc, ident.1 = 0, logfc.threshold = 0.25, test.use = "roc", only.pos = TRUE)
head(cluster1.markers)
## myAUC avg_diff power pct.1 pct.2
## RPS12 0.822 0.5029988 0.644 1.000 0.991
## RPS27 0.822 0.5020359 0.644 0.999 0.992
## RPS6 0.820 0.4673635 0.640 1.000 0.995
## RPL32 0.815 0.4242773 0.630 0.999 0.995
## RPS14 0.807 0.4283480 0.614 1.000 0.994
## RPS25 0.802 0.5203411 0.604 0.997 0.975
Visualize markers expression distributions.
VlnPlot(pbmc, features = c("MS4A1", "CD79A"))
Visualize markers expression distributions using raw counts.
VlnPlot(pbmc, features = c("MS4A1", "CD79A"),slot = "counts", log = TRUE)
Visualize markers expression in the embedded space.
#FeaturePlot(pbmc,features = c("MS4A1", "GNLY", "CD3E", "CD14", "FCER1A", "FCGR3A", "LYZ", "PPBP", "CD8A"))
FeaturePlot(pbmc,features = c("LDHB", "LTB", "S100A9", "CD79A", "CCL5", "FCGR3A", "GZMB", "FCER1A","PF4"))
Visualize markers as a heatmap
top10 <- pbmc.markers %>% group_by(cluster) %>% top_n(n = 10, wt = avg_logFC)
DoHeatmap(pbmc, features = top10$gene) + NoLegend()
Using literature we can associate a cell type to identified clusters via gene markers.
new.cluster.ids <- c("Naive CD4 T", "Memory CD4 T", "CD14+ Mono", "B", "CD8 T", "FCGR3A+ Mono",
"NK", "DC", "Platelet")
names(new.cluster.ids) <- levels(pbmc)
pbmc <- RenameIdents(pbmc, new.cluster.ids)
DimPlot(pbmc, reduction = "umap", label = TRUE, pt.size = 0.5,file='figure.final.pdf') + NoLegend()
This expression data, as the ones you will be working with, they are already pre-processed. No need to filter for mitochondrial counts.
load("data/malignantCells.2k.RData")
str(malignantCellsExpression[1:5,1:5])
## 'data.frame': 5 obs. of 5 variables:
## $ RPS11 : num 7.89 8.33 7.83 7.6 8.13
## $ DECR1 : num 0 3.3 0 0 4.14
## $ RPS18 : num 9.62 9.98 10.3 9.58 10.34
## $ SUMO1 : num 0.747 5.154 5.179 4.247 4.58
## $ UQCR11: num 5.6 4.69 5.38 2.55 4.65
results=prcomp(malignantCellsExpression)
plot(results$x[,'PC1'],results$x[,'PC2'],main='PCA of malignant cells',pch=19,xlab='PC 1',ylab='PC 2')
Associate patient metadata to plotting colors.
plottingColors=brewer.pal(length(unique(malignantCellsPatientMetadata)),'Dark2')
names(plottingColors)=unique(malignantCellsPatientMetadata)
Inspect variables.
str(malignantCellsPatientMetadata)
## chr [1:1061] "Mel81" "Mel80" "Mel81" "Mel80" "Mel81" "Mel81" "Mel80" ...
str(unique(malignantCellsPatientMetadata))
## chr [1:6] "Mel81" "Mel80" "Mel79" "Mel78" "Mel88" "Mel89"
str(plottingColors)
## Named chr [1:6] "#1B9E77" "#D95F02" "#7570B3" "#E7298A" "#66A61E" ...
## - attr(*, "names")= chr [1:6] "Mel81" "Mel80" "Mel79" "Mel78" ...
str(plottingColors[malignantCellsPatientMetadata])
## Named chr [1:1061] "#1B9E77" "#D95F02" "#1B9E77" "#D95F02" "#1B9E77" ...
## - attr(*, "names")= chr [1:1061] "Mel81" "Mel80" "Mel81" "Mel80" ...
Plot again with patient metadata as different colors.
plot(results$x[,'PC1'],results$x[,'PC2'],main='PCA of malignant cells',col=plottingColors[malignantCellsPatientMetadata],pch=19,xlab='PC 1',ylab='PC 2')
legend('bottomright',legend=unique(malignantCellsPatientMetadata),fill=plottingColors)
results2D=Rtsne(malignantCellsExpression,dims=2,perplexity=50,theta=0)
plot(results2D$Y,main='tSNE of malignant cells, p=50',col=plottingColors[malignantCellsPatientMetadata],pch=19,xlab='tSNE Component 1',ylab='tSNE Component 2')
legend('topright',legend=unique(malignantCellsPatientMetadata),fill=plottingColors)
We now look at single-cell RNA-Seq data from malignant and non-malignant melanoma cells.
load("data/malignantCells.2k.RData")
str(malignantCellsExpression[1:5,1:5])
## 'data.frame': 5 obs. of 5 variables:
## $ RPS11 : num 7.89 8.33 7.83 7.6 8.13
## $ DECR1 : num 0 3.3 0 0 4.14
## $ RPS18 : num 9.62 9.98 10.3 9.58 10.34
## $ SUMO1 : num 0.747 5.154 5.179 4.247 4.58
## $ UQCR11: num 5.6 4.69 5.38 2.55 4.65
load("data/nonMalignantCells.2k.RData")
str(nonMalignantCellsExpression[1:5,1:5])
## 'data.frame': 5 obs. of 5 variables:
## $ RPS11 : num 7.86 9.18 9.32 8.16 8.15
## $ TRAF3IP2-AS1: num 2.15 2.001 2.104 0 0.797
## $ RPS18 : num 9.51 6.22 0 8.46 9.37
## $ SUMO1 : num 0 0 7.13 5.5 0
## $ UQCR11 : num 3.21 0 7.14 2.97 0
results=prcomp(malignantCellsExpression)
plot(results$x[,'PC1'],results$x[,'PC2'],main='PCA of malignant cells',pch=19,xlab='PC 1',ylab='PC 2')
plottingColors=brewer.pal(length(unique(malignantCellsPatientMetadata)),'Dark2')
names(plottingColors)=unique(malignantCellsPatientMetadata)
str(malignantCellsPatientMetadata)
## chr [1:1061] "Mel81" "Mel80" "Mel81" "Mel80" "Mel81" "Mel81" "Mel80" ...
str(unique(malignantCellsPatientMetadata))
## chr [1:6] "Mel81" "Mel80" "Mel79" "Mel78" "Mel88" "Mel89"
str(plottingColors)
## Named chr [1:6] "#1B9E77" "#D95F02" "#7570B3" "#E7298A" "#66A61E" ...
## - attr(*, "names")= chr [1:6] "Mel81" "Mel80" "Mel79" "Mel78" ...
str(plottingColors[malignantCellsPatientMetadata])
## Named chr [1:1061] "#1B9E77" "#D95F02" "#1B9E77" "#D95F02" "#1B9E77" ...
## - attr(*, "names")= chr [1:1061] "Mel81" "Mel80" "Mel81" "Mel80" ...
plot(results$x[,'PC1'],results$x[,'PC2'],main='PCA of malignant cells',col=plottingColors[malignantCellsPatientMetadata],pch=19,xlab='PC 1',ylab='PC 2')
legend('bottomright',legend=unique(malignantCellsPatientMetadata),fill=plottingColors)
results2D=Rtsne(malignantCellsExpression,dims=2,perplexity=50,theta=0)
plot(results2D$Y,main='tSNE of malignant cells, p=50',col=plottingColors[malignantCellsPatientMetadata],pch=19,xlab='tSNE Component 1',ylab='tSNE Component 2')
legend('topright',legend=unique(malignantCellsPatientMetadata),fill=plottingColors)
results.n=prcomp(nonMalignantCellsExpression)
plot(results.n$x[,'PC1'],results.n$x[,'PC2'],main='PCA of non-malignant cells',pch=19,xlab='PC 1',ylab='PC 2')
plottingColors=brewer.pal(length(unique(tumorLabels)),'Dark2')
names(plottingColors)=unique(tumorLabels)
plottingColorsImmune=brewer.pal(length(unique(immuneLabels)),'Dark2')
names(plottingColorsImmune)=unique(immuneLabels)
plot(results.n$x[,'PC1'],results.n$x[,'PC2'],main='PCA of non-malignant cells',col=plottingColorsImmune[immuneLabels],pch=19,xlab='PC 1',ylab='PC 2')
legend('bottomright',legend=unique(immuneLabels),fill=plottingColorsImmune)
# cluster by patients
results2D.n=Rtsne(nonMalignantCellsExpression,dims=2,perplexity=50,theta=0)
plot(results2D.n$Y,main='tSNE of non-malignant cells, p=50',col=plottingColors[tumorLabels],pch=19,xlab='tSNE Component 1',ylab='tSNE Component 2')
legend('topright',legend=unique(tumorLabels),fill=plottingColors)
# cluster by cell type
results2D.n=Rtsne(nonMalignantCellsExpression,dims=2,perplexity=50,theta=0)
plot(results2D.n$Y,main='tSNE of non-malignant cells, p=50',col=plottingColorsImmune[immuneLabels],pch=19,xlab='tSNE Component 1',ylab='tSNE Component 2')
legend('topright',legend=unique(immuneLabels),fill=plottingColorsImmune, bty='n')
results2D.n=Rtsne(nonMalignantCellsExpression,dims=2,perplexity=10,theta=0)
plot(results2D.n$Y,main='tSNE of non-malignant cells, p=10',col=plottingColorsImmune[immuneLabels],pch=19,xlab='tSNE Component 1',ylab='tSNE Component 2')
legend('topright',legend=unique(immuneLabels),fill=plottingColorsImmune, bty='n')
results2D.n=Rtsne(nonMalignantCellsExpression,dims=2,perplexity=100,theta=0)
plot(results2D.n$Y,main='tSNE of non-malignant cells, p=100',col=plottingColorsImmune[immuneLabels],pch=19,xlab='tSNE Component 1',ylab='tSNE Component 2')
legend('topright',legend=unique(immuneLabels),fill=plottingColorsImmune, bty='n')
## 3.4. Using Seurat
nmce <- CreateSeuratObject(counts = t(nonMalignantCellsExpression), project = "nmce", min.cells = 3, min.features = 200)
nmce <- AddMetaData(nmce,immuneLabels,col.name="Immune")
FeatureScatter(nmce, feature1 = "nCount_RNA", feature2 = "nFeature_RNA",ylim=c(0,5000))
nmce <- subset(nmce, subset = nFeature_RNA > 300 & nFeature_RNA < 1500)
nmce <- FindVariableFeatures(nmce, selection.method = "vst", nfeatures = 1000)
top10 <- head(VariableFeatures(nmce), 10)
nmce <- NormalizeData(nmce, normalization.method = "LogNormalize", scale.factor = 10000)
nmce <- ScaleData(nmce, features = rownames(nmce))
## Centering and scaling data matrix
nmce.pca <- RunPCA(nmce, features = VariableFeatures(object = nmce))
## PC_ 1
## Positive: IL32, CD3D, CST7, CD2, SRGN, NKG7, PRKCH, PRF1, GZMA, PDCD1
## IL2RB, TIGIT, ID2, ITM2A, CD8A, CTSD, GZMK, SIRPG, DUSP4, CCL4
## HCST, KLRK1, PRDM1, CCL4L2, CCL4L1, TOX, LINC00152, PYHIN1, GIMAP7, CD7
## Negative: MTRNR2L2, MTRNR2L8, IRF8, SELL, CCR7, HLA-DQB1, RPL3, HLA-DRA, LTB, CD55
## HLA-DQA2, HLA-DQA1, FAM65B, CXCR4, ST6GAL1, LY9, GPR183, CD37, TXNIP, RPLP0
## SNX2, EEF1G, MIR4461, RPS21, RPL4, RPL29, PDE4B, RPS13, HLA-DRB5, RPL13A
## PC_ 2
## Positive: CCL5, CD2, CD3D, CD8A, GZMK, CST7, CCL4, NKG7, KLRK1, IL32
## GZMA, PRF1, CCL4L2, CCL4L1, DENND2D, CBLB, SYTL3, PTPRCAP, PDCD1, TOX
## LCK, FYN, LOC100130231, SIRPG, PRKCH, CD7, IL2RB, PYHIN1, PRDM1, ZAP70
## Negative: IRF8, SNX2, GPX1, TUBA1B, IFITM3, NME2, PRDX1, TUBB, PSMB6, ST6GAL1
## CD55, EIF5A, ANXA2, HLA-DQB1, TPM4, TAGLN2, APEX1, WARS, SYNGR2, CTSA
## GDI2, HLA-DRA, POMP, TMEM123, PFKL, CMTM6, RAB1A, SDCBP, EIF4A1, EIF3L
## PC_ 3
## Positive: HLA-DRB1, HLA-DRB5, HLA-DQA1, HLA-DPA1, HLA-DRA, HLA-DQA2, IRF8, HLA-DQB1, PTPRCAP, RAC2
## CD37, HLA-DMA, CXCR4, RGS2, PTPN6, CD53, CD69, UCP2, RPS4Y1, TAGAP
## LSP1, INPP5D, CD74, PTK2B, CORO1A, PARP15, NR4A2, PARVG, RHOH, SASH3
## Negative: C1orf56, XIST, IFITM3, IL7R, LGALS1, S100A11, ANXA1, ANXA2, S100A10, SERINC5
## IFI6, S100A6, VIM, GIMAP4, HSPB1, PBXIP1, TCF7, MIR4461, ITGB1, STOM
## CD63, TMEM123, PRMT2, RPS27L, IL32, AES, GIMAP7, ANXA5, WTAP, SAR1A
## PC_ 4
## Positive: HLA-DRB1, HLA-DRB5, HLA-DPA1, CD74, CCL4L2, CCL4L1, HLA-DQA1, CCL4, NKG7, HLA-DRA
## CD8A, HLA-DQA2, PRF1, CD63, HAVCR2, KLRK1, GZMA, HLA-DMA, CST7, HLA-DQB1
## WARS, MTRNR2L8, GSTP1, MTRNR2L2, APOBEC3G, CCL5, SYNGR2, DUSP4, CTSD, HSPB1
## Negative: IL7R, TCF7, SERINC5, CCR7, SPOCK2, DGKA, CD6, TMEM123, LTB, TC2N
## ITK, FAM65B, SAMHD1, SELL, ZAP70, GPR183, LDHB, LAT, LEPROTL1, PIK3IP1
## CD69, FAM102A, CD48, JUNB, PBXIP1, CYTIP, STAT4, CYTH1, TSC22D3, CD247
## PC_ 5
## Positive: EMP3, CD52, CORO1A, LCP1, ARHGDIB, FAM65B, GBP1, XIST, MIR4461, S100A4
## SELPLG, PIM2, C1orf56, HCLS1, IFI6, VIM, GBP5, CCND2, CD37, RHOF
## PSMB10, HLA-DRA, CD48, LTB, TRAF3IP3, RAC2, ALOX5AP, AES, GNG2, ANXA1
## Negative: TSPYL2, HSPA1A, RGS2, DNAJB1, JUN, DUSP1, MTRNR2L8, DNAJA1, MTRNR2L2, FOS
## FKBP5, RPS4Y1, CBLB, HSP90AB1, PRKCH, HSPE1, PIK3IP1, TUBA1A, NFAT5, NFKBIA
## CREM, PPP1R15A, ZNF331, ZBTB38, HSPA8, HSPB1, RNF19A, NXF1, HIF1A, TOX
VizDimLoadings(nmce.pca, dims = 1:2, reduction = "pca")
pdf('nm_seurat_pca_immune.pdf')
DimPlot(nmce.pca, reduction = "pca",group.by="Immune")
dev.off()
## quartz_off_screen
## 2
nmce.clust <- FindNeighbors(nmce.pca, dims = 1:10)
## Computing nearest neighbor graph
## Computing SNN
nmce.clust <- FindClusters(nmce.clust, resolution = 0.5)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 1042
## Number of edges: 33042
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8205
## Number of communities: 6
## Elapsed time: 0 seconds
nmce.clust <- RunUMAP(nmce.clust, dims = 1:10)
pdf('nm_seurat_umap_immune.pdf')
DimPlot(nmce.clust, reduction = "umap",group.by="Immune")
dev.off()
## quartz_off_screen
## 2
nmce.clust <- RunTSNE(nmce.clust, dims = 1:10)
pdf('nm_seurat_tsne_immune.pdf')
DimPlot(nmce.clust, reduction = "tsne",group.by="Immune")
dev.off()
## quartz_off_screen
## 2
pdf('nm_seurat_pca.pdf')
DimPlot(nmce.clust, reduction = "pca")
dev.off()
## quartz_off_screen
## 2
cluster0.markers <- FindMarkers(nmce.clust, ident.1 = 0, min.pct = 0.25)
cluster1.markers <- FindMarkers(nmce.clust, ident.1 = 1, min.pct = 0.25)
cluster2.markers <- FindMarkers(nmce.clust, ident.1 = 2, min.pct = 0.25)
cluster3.markers <- FindMarkers(nmce.clust, ident.1 = 3, min.pct = 0.25)
cluster4.markers <- FindMarkers(nmce.clust, ident.1 = 4, min.pct = 0.25)
cluster5.markers <- FindMarkers(nmce.clust, ident.1 = 5, min.pct = 0.25)
head(cluster0.markers, n = 5)
## p_val avg_logFC pct.1 pct.2 p_val_adj
## NKG7 3.454119e-127 1.669612 0.928 0.141 6.908238e-124
## CD8A 1.491073e-111 1.611743 0.888 0.192 2.982146e-108
## PRF1 3.589568e-108 1.540983 0.862 0.169 7.179136e-105
## CST7 1.705780e-100 1.291854 0.943 0.219 3.411560e-97
## GZMA 6.625120e-98 1.448867 0.842 0.128 1.325024e-94
head(cluster1.markers, n = 5)
## p_val avg_logFC pct.1 pct.2 p_val_adj
## IL7R 4.238558e-72 1.363104 0.764 0.228 8.477116e-69
## HLA-DRB5 5.378437e-45 -1.208904 0.193 0.714 1.075687e-41
## HLA-DPA1 4.120055e-41 -1.063219 0.295 0.749 8.240109e-38
## HLA-DRB1 7.702127e-40 -1.012377 0.283 0.761 1.540425e-36
## HLA-DQA1 9.884618e-39 -1.425495 0.087 0.570 1.976924e-35
head(cluster2.markers, n = 5)
## p_val avg_logFC pct.1 pct.2 p_val_adj
## IRF8 2.515263e-141 1.9819463 0.906 0.116 5.030527e-138
## HLA-DRA 7.246820e-120 1.3475207 1.000 0.398 1.449364e-116
## CD74 2.683786e-107 0.5661465 1.000 0.902 5.367572e-104
## HLA-DQB1 3.552468e-98 1.3194694 0.944 0.309 7.104936e-95
## HLA-DPA1 4.985555e-98 0.9439924 0.996 0.535 9.971110e-95
head(cluster3.markers, n = 5)
## p_val avg_logFC pct.1 pct.2 p_val_adj
## ITK 1.036214e-22 0.7150541 0.770 0.337 2.072427e-19
## IL6ST 2.818038e-22 0.7435661 0.791 0.458 5.636075e-19
## TOX 6.642883e-22 0.6785674 0.813 0.364 1.328577e-18
## SH2D1A 8.967137e-21 0.7542328 0.662 0.267 1.793427e-17
## RNF19A 3.346206e-20 0.6156584 0.863 0.512 6.692412e-17
head(cluster4.markers, n = 5)
## p_val avg_logFC pct.1 pct.2 p_val_adj
## IFITM3 1.083594e-38 1.875376 1.000 0.271 2.167188e-35
## ANXA2 4.700911e-34 1.417533 1.000 0.301 9.401822e-31
## CD63 3.485409e-25 1.173200 0.975 0.332 6.970818e-22
## PTPRC 5.076272e-25 -2.173559 0.125 0.992 1.015254e-21
## CXCR4 3.812010e-23 -2.537301 0.050 0.932 7.624020e-20
head(cluster5.markers, n = 5)
## p_val avg_logFC pct.1 pct.2 p_val_adj
## GPX1 2.674287e-21 1.4131941 0.963 0.312 5.348575e-18
## PSAP 3.504584e-19 1.0876540 1.000 0.572 7.009167e-16
## CD68 1.135705e-17 1.6452605 1.000 0.547 2.271411e-14
## FTL 5.850214e-17 0.5489939 1.000 0.929 1.170043e-13
## SAT1 1.223098e-16 0.8263940 1.000 0.672 2.446196e-13